[SYCL] Add bfloat16 utils based on libdevice bfloat16 support. #7503
Conversation
…ort. Signed-off-by: jinge90 <ge.jin@intel.com>
Do you know somebody who is familiar with math/half/bfloat/etc to review this instead of me? I'm afraid I lack knowledge in the area.
// Need to ensure that sycl bfloat16 defined in bfloat16.hpp is compatible
// with uint16_t in layout.
#if __cplusplus >= 201703L
static_assert(sizeof(sycl_bfloat16) == sizeof(_iml_bfloat16_internal),
              "sycl bfloat16 is not compatible with _iml_bfloat16_internal.");
Why do we need a distinct alias at all?
Hi, @aelovikov-intel
What distinct alias do you refer to?
Thanks very much.
Why can't we use the same type in all places? Why do we need both sycl_bfloat16 and _iml_bfloat16_internal?
Hi, @aelovikov-intel
The functions defined in the sycl::ext::intel::math:: namespace, such as hge, hgt..., are C++ wrappers for C functions provided in libdevice. Each of these C++ functions calls the corresponding C function, and sycl bfloat16 users will only work with the C++ functions. sycl bfloat16 is a C++ class defined in https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/ext/oneapi/bfloat16.hpp and its current implementation is based on the "uint16_t" type. So our C++ functions such as hge, hgt... can only accept the user-visible sycl bfloat16. However, the C functions implemented in libdevice can't use the C++ sycl bfloat16 type; they can only accept native C types, so we use the native C uint16_t type for them. In the future, we may use a native bfloat16 type instead of the emulation based on uint16_t.
Thanks very much.
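To make the split concrete, here is a minimal sketch of that wrapper pattern; the __devicelib_* entry point, its signature, and the hge body shown here are illustrative assumptions, not the actual PR code:

```cpp
#include <cstdint>
#include <sycl/ext/oneapi/bfloat16.hpp>

// Hypothetical libdevice C entry point operating on raw uint16_t storage;
// the real name/signature in libdevice may differ.
extern "C" int __devicelib_imf_bfloat16_ge(uint16_t a, uint16_t b);

namespace sycl::ext::intel::math {

using sycl_bfloat16 = sycl::ext::oneapi::bfloat16;

// User-visible C++ wrapper: takes sycl bfloat16 and forwards the raw bits
// to the C function. This is only valid while bfloat16 is layout-compatible
// with uint16_t, which is what the static_assert above checks.
inline bool hge(sycl_bfloat16 a, sycl_bfloat16 b) {
  return __devicelib_imf_bfloat16_ge(__builtin_bit_cast(uint16_t, a),
                                     __builtin_bit_cast(uint16_t, b)) != 0;
}

} // namespace sycl::ext::intel::math
```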
Hi, @aelovikov-intel
Signed-off-by: jinge90 <ge.jin@intel.com>
Some of the comparison functions appear to be duplicates of those already defined in the bfloat16 class: https://github.com/intel/llvm/blob/37b05f03cb91b1ee04efe41bd5895a9476a2e793/sycl/include/sycl/ext/oneapi/bfloat16.hpp

Also, some of the math functions, max/min for example, are duplicates of those already added in this oneapi extension: https://github.com/intel/llvm/blob/37b05f03cb91b1ee04efe41bd5895a9476a2e793/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16_math_functions.asciidoc

I think the idea was to add to these existing implementations, with Intel builtins or generic implementations if they don't exist, and then add more math functions progressively, mainly through generic implementations that do the operations using float, as here.

Generally the code looks target agnostic: can this not be added to the oneapi extension directly, rather than the Intel namespace?
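For illustration only: since the class already provides comparisons (per the first link above), plain operator syntax already works, which is why a separate Intel-namespace comparison wrapper would be a duplicate:

```cpp
#include <sycl/ext/oneapi/bfloat16.hpp>

using bf16 = sycl::ext::oneapi::bfloat16;

bool already_expressible(bf16 a, bf16 b) {
  // Works today through bfloat16's own operators / float conversion,
  // so a new hgt(a, b)-style wrapper would just restate this.
  return a > b;
}
```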
Hi, @JackAKirk
Hi, @aelovikov-intel
Thanks for the explanation. Note that we planned on adding many more bfloat16 math functions to https://github.com/intel/llvm/blob/37b05f03cb91b1ee04efe41bd5895a9476a2e793/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16_math_functions.asciidoc eventually.
Hi, @JackAKirk
"Calculates nv_bfloat16 max(a, b) defined as (a > b) ? a : b."
Thanks very much.
That was my mistake: I wrote the SYCL definition and I should have added this missing information, which is the sole discrepancy between the definitions.

However, the key point is that these bfloat16 functions [...] Note also that we have a similar situation in this extension: #7397, where we adopted the cuda naming of [...]

Whatever the solution, we should probably decide on a standard set of naming conventions to avoid duplications and confusion. We also planned to introduce e.g. [...]
btw I'm curious whether there is a reason to prioritize all the bfloat16 comparison functions ahead of the math functions: https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH____BFLOAT16__FUNCTIONS.html#group__CUDA__MATH____BFLOAT16__FUNCTIONS
Hi, @JackAKirk
Hi, @JackAKirk
Thanks very much.
All the bfloat16 math functions mentioned in the extension doc are already implemented in the ext_oneapi_cuda backend: https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/ext/oneapi/experimental/bfloat16_math.hpp. They include min and max. If we can resolve the naming issues it would be good to add your general implementations for other backends to the existing bfloat16_math.hpp header.
It was decided that __nv_bfloat162 should map to sycl::marray<bfloat16, 2>; note that the math functions are also supported by the general sycl::marray<bfloat16, N> case.

It is unclear whether we will also add support for sycl::vec<bfloat16, N>: this is still an open question afaik, and any input is welcomed. The thinking was that we didn't want to add duplicates if the vec impl wasn't required, since the marray implementation should be equivalent but more general. Note also that Nvidia libraries use an analogue of sycl::marray<bfloat16, N>; such an analogue doesn't exist in the CUDA runtime API.
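As a usage sketch of that mapping (assuming the fmax overloads described in the existing bfloat16 math extension doc, on a backend where they are implemented):

```cpp
#include <sycl/sycl.hpp>
#include <sycl/ext/oneapi/experimental/bfloat16_math.hpp>

using bf16 = sycl::ext::oneapi::bfloat16;
namespace sem = sycl::ext::oneapi::experimental;

void marray_max_example() {
  // __nv_bfloat162 corresponds to sycl::marray<bfloat16, 2>, and the
  // extension's math functions also accept the general marray<bfloat16, N>.
  sycl::marray<bf16, 2> a{bf16(1.5f), bf16(-2.0f)};
  sycl::marray<bf16, 2> b{bf16(0.5f), bf16(3.0f)};
  sycl::marray<bf16, 2> m = sem::fmax(a, b); // element-wise maximum
  (void)m;
}
```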
Hi, @JackAKirk
So I think it would make sense for the sycl::ext::oneapi bfloat16 math extension (I think convert functions can also be covered in this extension; perhaps it should be renamed to a bfloat16 builtins extension) to support all the functions that cuda supports.
Hi, @aelovikov-intel
Could you help take a look?
I agree with @JackAKirk's concerns regarding duplicated functionality and would like to hear from @gmlueck as well.
Also, why do we have an extension implementation without the extension specification?
// Need to ensure that sycl bfloat16 defined in bfloat16.hpp is compatible
// with uint16_t in layout.
#if __cplusplus >= 201703L
No need to check this - C++17 is the minimal supported version.
There should definitely be an extension specification whenever we add extended APIs to SYCL. However, rather than creating a new extension, can we augment the existing sycl_ext_oneapi_bfloat16_math_functions?

One thing that's not clear to me is whether the math functions in this PR are available on all devices, or if they are only available on Nvidia devices. I think the ones in sycl_ext_oneapi_bfloat16_math_functions are only available on Nvidia?

We should also pay attention to the namespace. The functions in sycl_ext_oneapi_bfloat16_math_functions live in sycl::ext::oneapi::experimental. @jinge90 mentions above that we might decide to remove some of these math functions later. If this is our intention, the functions in this PR must be an "experimental" extension, and they must reside in the experimental namespace.
Currently the ones in sycl_ext_oneapi_bfloat16_math_functions are only implemented for Nvidia here: https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/ext/oneapi/experimental/bfloat16_math.hpp, but as discussed below this was just because we used nvidia-specific builtins for those cases.
The only contentious point is that we need to decide the naming convention for these functions, as discussed in earlier messages.

The intention was always that the bfloat16 functions we have added so far would also be implemented on other backends (hence the oneapi namespace), via backend-specific builtins or a generic implementation. For ext_oneapi_cuda we started with the bfloat16 functions that had nvptx builtins, and hence we had no reason to write generic software impls initially that other backends could also use, at least before they add their own builtins etc. We have already implemented all such cases where special Nvidia nvptx builtins exist. This means that in the ext_oneapi_cuda backend all new bfloat16 functions will just use a generic software implementation.
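A minimal sketch of the kind of generic, target-agnostic fallback being described (the function name is illustrative): widen to float, reuse the existing float builtin, and narrow back.

```cpp
#include <sycl/sycl.hpp>

using bf16 = sycl::ext::oneapi::bfloat16;

// Generic software implementation usable on any backend. A backend that
// gains a native bfloat16 builtin can later swap this out for it.
inline bf16 generic_fmin(bf16 a, bf16 b) {
  return bf16(sycl::fmin(static_cast<float>(a), static_cast<float>(b)));
}
```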
sycl::ext::oneapi::bfloat16 float2bfloat16(float f) {
  return __builtin_bit_cast(sycl::ext::oneapi::bfloat16,
I think it might be recommended to use oneapi::detail::bitsToBfloat16 here and in other places instead of __builtin_bit_cast?
Hi, @JackAKirk
I didn't realize sycl bfloat16 already has bitsToBfloat16; I will update the code if we move the generic implementation to bfloat16_math.hpp.
Thanks very much.
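If that change is made, the conversion above could look roughly like the sketch below; the __devicelib_* name is a placeholder, and bitsToBfloat16 is an internal helper whose exact namespace/signature may differ from what is assumed here:

```cpp
#include <cstdint>
#include <sycl/ext/oneapi/bfloat16.hpp>

// Placeholder for the libdevice conversion that returns raw bfloat16 bits;
// the real entry-point name in the PR may differ.
extern "C" uint16_t __devicelib_imf_float2bfloat16(float f);

inline sycl::ext::oneapi::bfloat16 float2bfloat16(float f) {
  // Construct the bfloat16 from its raw bits via the detail helper
  // suggested above, instead of __builtin_bit_cast.
  return sycl::ext::oneapi::detail::bitsToBfloat16(
      __devicelib_imf_float2bfloat16(f));
}
```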
This makes sense to me. Is our plan, then, to change this PR to do this?
Some input from my side:
Hi, @gmlueck and @JackAKirk
For 1, the examples are hmax, hmin..., and for 2, the examples are the type convert functions with different rounding modes. The reason we implement some functions in libdevice is that their real implementation is more complicated and not suitable for a header file. Our bfloat16 utils aim to be tested on, and to work on, all platforms the SYCL compiler supports (CPU, Intel GPU, ACC) except the NV backend, and we also have corresponding lit tests: intel/llvm-test-suite#1364
We can try to move max, min to bfloat16_math.hpp as a first step; does this approach make sense to you?
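To give a flavor of why the rounding-mode conversions are more than a header one-liner, here is a purely illustrative round-to-nearest-even float-to-bfloat16 bit manipulation (not the libdevice code; NaN handling is omitted):

```cpp
#include <cstdint>
#include <cstring>

// Round-to-nearest-even conversion from float to raw bfloat16 bits.
// Illustrative only; a real implementation must also special-case NaN,
// and the other rounding modes (toward zero, up, down) each need their
// own handling.
inline uint16_t float_to_bfloat16_rne_bits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  uint32_t lsb = (bits >> 16) & 1u;  // lowest bit that survives truncation
  bits += 0x7FFFu + lsb;             // add rounding bias; ties go to even
  return static_cast<uint16_t>(bits >> 16);
}
```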
Why wouldn't we want these bfloat16 utils to work on NV backend?
If you are not in a hurry, I don't mind waiting until we have a better idea of the other math functions we want to add.
Why wouldn't we move all of the bfloat16 math functions to the sycl_ext_oneapi_bfloat16_math_functions extension? Isn't it better to have them all in one place?
Hi, @gmlueck

"Why wouldn't we want these bfloat16 utils to work on NV backend?"
[...]

"Why wouldn't we move all of the bfloat16 math functions to the sycl_ext_oneapi_bfloat16_math_functions extension? Isn't it better to have them all in one place?"
[...]

Thanks very much.
You don't have to worry about this. It is true that in many cases we will eventually want impls that use a target-specific builtin.

However, this is not true for all cases, and there is no reason not to let the Nvidia backend (or any other backend IMO) use generic implementations when they are added. We would prefer some not perfectly optimal implementation rather than no implementation. There was a similar situation here: #6038. We implemented marray math functions using the simplest solution from the set of optimal algorithms for the Nvidia backend, but we also switched on the functions for other backends; in the future these other backends could switch to a more optimal implementation for that backend if it exists.
I imagine it is safe to assume that if there is already a given sycl builtin for the float type, we will want to add the corresponding bfloat16 function in the bfloat16 extension. These are all very standard, widely used math functions and I think it would be strange if they didn't have bfloat16 support. This is the case for all the "math" bfloat16 variants from the CUDA library. I think there would only be some argument for not adopting a builtin generally across backends in the case that it is for a very niche application. I am guessing this is going to be quite rare.
LGTM
Signed-off-by: jinge90 <ge.jin@intel.com>